Column

Map

Column

What is this?

You’re looking all US Census tracts in California with domestic well completion reports.

Colors correpsond to the difference from median Californian household income ($57,792): red is lower, and blue is higher.

Adjust the layers to view median household income across the entire state.

Background

More domestic wells are completed in California than any other type of well.

Domestic wells tend to be screened closer to land surface, and run the risk of drying out before deeper agricultural and public supply wells, as we saw in the most recent drought.

In late 2017, California made an online state well completion report database available to the public, but the database suffers from missing values, errors from human entry, and is quite sizable at roughly ~1 million observations of 46 variables, or 46,000,000 elements.

Others have shown that disadvantaged unincorporated communities (DUCs), primarily in the southern Central Valley are at risk of unsafe drinking water, that there are racial and ethnic disparities in access to safe drinking water, and that safe drinking water is often nearby.

In this study, I consider domestic well completion in the entire state of California, and show that communities with both high and low socioeconomic status are affected by drought and the need to complete new wells, and that those communities vary greatly in terms of household income.

This suggests that past and future groundwater shortages in the state of California affect and will continue to affect low income communities that often lack the access to the political and financial capital necessary to alleviate the threats imposed by water scarcity.

Summary

What can we learn from exploring this map?

  • Droughts and water shortage affect both high-SES and low-SES tracts.

  • However, low-SES tracts will be less financially equipped to deal with groundwater shortages and the costs of completing a new well.

  • Low-SES tracts tend to be larger in area and geographically isolated from urban centers. When water shortages occur, these tracts are are less likely to be within range of a public water system.

About

This project grows out of my fascination with groundwater, disadvantaged communities, and data science.

It is tmy goal to apply machine learning to this and other data to predict drought vulnerability and assess which communities should be monitored and assisted before extreme water shortage events like the 2016 drought.

I would thank the California Department of Water Resources for making this data publically accessible, and Kyle Walker, for the tidycensus R package.


About the Author:

Rich Pauloo
personal website


Percent difference per zone is calculated as:

Where a is the household income of tract i, and b is the median household income for California. The median household income is used as a benchmark for percent difference because the distribution of incomes is left-skewed by a small handful of high income zones, shown on the first tab.

---
title: "Assessing Drought Vulnerability in California with Well Completion Data"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    #vertical_layout: fill
    theme: cosmo
    source_code: embed
    social: menu

---

```{r setup, include=FALSE}
library(flexdashboard)
```


Column {data-width=700}
-----------------------------------------------------------------------

### Map

```{r, echo = FALSE, warning=FALSE, message=FALSE}
library(tidycensus)
#knitr::opts_chunk$set(out.width = "700px")
knitr::opts_chunk$set(cache=TRUE)
#census_api_key("32d5bcb711a8d0661adfe4a479f8f31f8b7335f0", install = TRUE, overwrite = TRUE)
```


```{r, echp = FALSE, warning=FALSE, message=FALSE}
# Load packages and data, download census data, and do some light cleaning.
library(leaflet) 
library(sf) # for spatial transforms
library(viridis) # for color palettes
library(tidyverse)
library(here) 
library(feather) # for fast reading/writing of dataframes
library(scales) # for dollar scale

# load preprocessed data for faster knitting speed
gwb_domestic_tracts <- read_feather("gwb_domestic_tracts.feather")
perc_diff_lab <- readRDS("perc_diff_lab.rds")
med_hhi_lab <- readRDS("med_hhi_lab.rds")

# input census api key and store it for later use
#census_api_key("", install = TRUE, overwrite = TRUE)
readRenviron("~/.Renviron")
options(tigris_use_cache = TRUE)

# get county level household income for CA
ca1 <- get_acs(geography = "county", 
               variables = c(hhincome = "B19013_001"), 
               state = "CA", 
               year = 2016, 
               geometry = TRUE) %>%
  st_transform(4326)

# tract level household income for CA
ca2 <- get_acs(geography = "tract", 
               variables = c(hhincome = "B19013_001"), 
               state = "CA", 
               year = 2016, 
               geometry = TRUE) %>%
  st_transform(4326)

# remove unnecessary columns and rename variables
ca2 %>% 
  select(NAME, estimate, moe, geometry) %>% 
  rename(name = NAME, hh_income = estimate) -> ca_2
```


```{r, eval = FALSE, echo = FALSE, warning=FALSE, message=FALSE, fig.width=10}
# And now it's time to create the visualization.

# read in gwb_pts
gwb_pts <- read_feather(here("R_intersection_gwb_pts.feather"))
gwb_pts <- st_as_sf(gwb_pts,
                    coords = c("lon", "lat"), # for point data
                    remove = F,
                    crs = 4326) # don't remove these lat/lon cols from df

# define projection
gwb_pts <- gwb_pts %>% sf::st_transform(4326) 

# get all domestic wells completed in 1977 and 1991 from gwb_pts sf object (in publish.Rmd)
gwb_domestic <- gwb_pts %>% filter(type == "domestic") # & year %in% c(1977,1991))

# perform intersection: join points to polygons, and filter points that fall outside of polygons
gwb_domestic_tracts <- st_intersection(gwb_domestic, ca_2)

# save
# gwb_domestic_tracts_2 <- gwb_domestic_tracts
# st_geometry(gwb_domestic_tracts_2) <- NULL 
#write_feather(gwb_domestic_tracts, "gwb_domestic_tracts.feather")
```

```{r, echo = FALSE, warning=FALSE, message=FALSE}
# gwb_domestic_tracts <- st_as_sf(gwb_domestic_tracts,
#                                 coords = c("lon", "lat"), # for point data
#                                 remove = F,
#                                 crs = 4326) # don't remove these lat/lon cols from df)

# get average median hh_income for ca
ca_median_hh_income <- gwb_domestic_tracts %>% pull(hh_income) %>% median(na.rm = TRUE)

# calculate the percent difference 
gwb_domestic_tracts %>% 
  group_by(name, hh_income, moe) %>% 
  summarise(count = n()) %>% 
  arrange(-count) %>% 
  mutate(perc_diff = ((hh_income - ca_median_hh_income) / (.5*(hh_income + ca_median_hh_income)) ) * 100) -> test

# remove geometry for the join with ploygons
#st_geometry(test) <- NULL 

# plot tracts and fill with perc_diff
left_join(ca_2, test, by = "name") -> temp

# define palette and bins
co = c("#b2182b", "#d6604d", "#f4a582", "#fddbc7", "#d1e5f0", "#92c5de", "#4393c3", "#2166ac")
bins = c("75 to 100%", "50 to 75%", "25 to 50%", "0 to 25%", "0 to -25%", "-25 to -50%", "-50 to -75%", "- 75 to -100%")

# cut continuous data into breaks and color those breaks
temp %>% 
  mutate(brks = cut(perc_diff, 
         breaks = c(-100,-75,-50,-25,0,25,50,75,100),
         labels = c("- 75 to -100%", "-50 to -75%", "-25 to -50%",
                    "0 to -25%","0 to 25%","25 to 50%", "50 to 75%", 
                    "75 to 100%"))) -> temp_cut
# temp_cut %>% 
#   ggplot() + geom_sf(aes(fill = brks), lwd = 0) +
#   scale_fill_manual(values = co, breaks = bins, na.value="white") -> p_perc_diff

#p_perc_diff
```

```{r, eval = FALSE}
# polygon labels
# create HTML labels for each polygon layer
perc_diff_lab <- lapply(seq(nrow(temp_cut)), function(i) {
  paste0( '

', round(as.numeric(temp_cut[i, "perc_diff"]),2), "%", '
', temp_cut[i, "name"], '
', 'hh_income: $',round(as.numeric(temp_cut[i, "hh_income.x"],2)), '
', 'mo_error: $', temp_cut[i, "moe.x"], '
', temp_cut[i, "count"], " wells completed", '

' ) }) %>% lapply(`[[`, 1) med_hhi_lab <- lapply(seq(nrow(temp_cut)), function(i) { paste0( '

', temp_cut[i, "name"], '
', '

', 'hh_income: $',round(as.numeric(temp_cut[i, "hh_income.x"],2)), '
', 'mo_error: $', temp_cut[i, "moe.x"], '

' ) }) %>% lapply(`[[`, 1) # save for faster loading while knitting saveRDS(perc_diff_lab, "perc_diff_lab.rds") saveRDS(med_hhi_lab, "med_hhi_lab.rds") ``` ```{r, warning=FALSE, message=FALSE} # make into leaflet # palettes pal_1 = colorBin(palette = co, domain = temp_cut$perc_diff, bins = c(-100,-75,-50,-25,0,25,50,75,100), na.color = "white") pal_2 <- colorBin("viridis", temp_cut$hh_income.x, bins = c(0,25000,50000,75000,100000,250001)) library(htmltools) library(htmlwidgets) temp_cut %>% leaflet(width = "100%") %>% addTiles() %>% addPolygons(stroke = FALSE, smoothFactor = 0.2, color = ~pal_1(perc_diff), #label = ~as.character(paste0(round(perc_diff,2)," %")), label = lapply(perc_diff_lab, htmltools::HTML), fillOpacity = 0.8, group = "% Diff from Med_HH Income") %>% addPolygons(stroke = FALSE, smoothFactor = 0.2, color = ~pal_2(hh_income.x), #label = ~as.character(paste0("$ ",hh_income.x)), label = lapply(med_hhi_lab, htmltools::HTML), fillOpacity = 0.8, group = "Med_HH Income") %>% addMarkers(data = gwb_domestic_tracts, lng = ~lon, lat = ~lat, popup = paste("Longitude:", gwb_domestic_tracts$lon, "
", "Latitude:", gwb_domestic_tracts$lat, "
", # "Perforated Interval Thickness:", gwb_domestic_tracts$b, "ft.", "
", "Top of Perforated Interval:", gwb_domestic_tracts$top, "ft.", "
", "Bottom of Perforated Interval:", gwb_domestic_tracts$bot, "ft."), group = "wells", clusterOptions = markerClusterOptions()) %>% addProviderTiles(providers$CartoDB.Positron) %>% addLayersControl(overlayGroups = c("% Diff from Med_HH Income", "Med_HH Income", "wells"), position = "topleft") %>% addLegend("topright", colors = rev(c("#b2182b", "#d6604d", "#f4a582", "#fddbc7", "#d1e5f0", "#92c5de", "#4393c3", "#2166ac")), labels = c("75 to 100%", "50 to 75%", "25 to 50%", "0 to 25%", "0 to -25%", "-25 to -50%", "-50 to -75%", "-75 to -100%"), title = "% diff from median hh_income", # legend displays time frame and depth class as title labFormat = labelFormat(suffix = " %"), opacity = 1) %>% hideGroup("Med_HH Income") %>% addMiniMap(position = "bottomleft") %>% addEasyButton(easyButton( icon="fa-globe", title="Zoom to CA", onClick=JS("function(btn, map){ map.setZoom(6); }"))) %>% addEasyButton(easyButton( icon="fa-crosshairs", title="Locate Me", onClick=JS("function(btn, map){ map.locate({setView: true}); }"))) -> socio_leaflet # htmlwidgets::saveWidget(socio_leaflet, file = "socio_leaflet.html", selfcontained = FALSE, title = "Socioeconomic disparity in California domestic well completion") socio_leaflet ``` Column {data-width=300 .tabset .tabset-fade} ----------------------------------------------------------------------- ### What is this? You're looking all US Census tracts in California with **domestic** well completion reports. Colors correpsond to the difference from median Californian household income ($57,792): red is lower, and blue is higher. Adjust the layers to view median household income across the entire state. ```{r, warning=FALSE, message=FALSE} x <- median(gwb_domestic_tracts$hh_income, na.rm = T) gwb_domestic_tracts %>% filter(hh_income <= x) -> low_ses gwb_domestic_tracts %>% ggplot(aes(x = hh_income)) + stat_density() -> p p <- ggplot_build(p) p$data %>% as.data.frame() %>% select(x, density) %>% mutate(ses = ifelse(x <= 57792, "low SES", "high SES")) -> p p %>% ggplot() + geom_line(aes(x, density), color= "black", size = 1) + geom_area(aes(x, density, fill = ses)) + geom_segment(aes(x = 57792, xend = 57792, y = 0, yend = 1.95e-05), lty = "dashed", color = "black", size = 1) + geom_point(aes(x = 57792, y = 2.0e-05), size = 2) + annotate("text", x = 110000, y = 2.1e-05, label = "median CA household income") + labs(title = "Household income in California", subtitle = "2016 US Census: tract-level data", y = "Density", x = "Household Income (USD)", fill = "") + scale_x_continuous(labels = dollar) + scale_fill_manual(breaks = c("low SES", "high SES"), labels = c("low SES", "high SES"), values = c("#92c5de","#f4a582")) + cowplot::theme_cowplot() + theme(plot.title = element_text(hjust = 0), legend.position = c(.75, .55)) ``` ### Background More **domestic wells** are completed in California than any other type of well. Domestic wells tend to be screened closer to land surface, and run the risk of drying out before deeper agricultural and public supply wells, as we saw in [the most recent drought](http://www.latimes.com/local/drought/la-me-east-porterville-drought-20160506-story.html). In late 2017, California made an online state well completion report database available to the public, but the database suffers from missing values, errors from human entry, and is quite sizable at roughly ~1 million observations of 46 variables, or **46,000,000 elements**. [Others](https://regionalchange.ucdavis.edu/sites/g/files/dgvnsk986/files/inline-files/The%20Struggle%20for%20Water%20Justice%20EXEC%20SUMM_1.pdf) have shown that disadvantaged unincorporated communities (DUCs), primarily in the southern Central Valley are at risk of unsafe drinking water, that there are racial and ethnic disparities in access to safe drinking water, and that safe drinking water is often nearby. In this study, I consider domestic well completion in the entire state of California, and show that communities with both high and low socioeconomic status are affected by drought and the need to complete new wells, and that those communities vary greatly in terms of household income. This suggests that past and future groundwater shortages in the state of California affect and will continue to affect low income communities that often lack the access to the political and financial capital necessary to alleviate the threats imposed by water scarcity. ### Summary What can we learn from exploring this map? * Droughts and water shortage affect both high-SES and low-SES tracts. * However, low-SES tracts will be less financially equipped to deal with groundwater shortages and the costs of completing a new well. * Low-SES tracts tend to be larger in area and geographically isolated from urban centers. When water shortages occur, these tracts are are less likely to be within range of a public water system. ### About This project grows out of my fascination with groundwater, disadvantaged communities, and data science. It is tmy goal to apply machine learning to this and other data to predict drought vulnerability and assess which communities should be monitored and assisted before extreme water shortage events like the 2016 drought. I would thank the California Department of Water Resources for making this data publically accessible, and [Kyle Walker](https://walkerke.github.io/), for the tidycensus R package. *** **About the Author**: Rich Pauloo [personal website](richpauloo.github.io) *** Percent difference per zone is calculated as: ![](eq.png){width=180px} Where *a* is the household income of tract *i*, and *b* is the median household income for California. The median household income is used as a benchmark for percent difference because the distribution of incomes is left-skewed by a small handful of high income zones, shown on the first tab.